29 research outputs found

Unsupervised Training for 3D Morphable Model Regression

Author: Cole Forrester
Freeman William T.
Genova Kyle
Maschinot Aaron
Sarna Aaron
Vlasic Daniel
Publication date: 15/06/2018

We present a method for training a regression network from image pixels to 3D morphable model coordinates using only unlabeled photographs. The training loss is based on features from a facial recognition network, computed on-the-fly by rendering the predicted faces with a differentiable renderer. To make training from features feasible and avoid network fooling effects, we introduce three objectives: a batch distribution loss that encourages the output distribution to match the distribution of the morphable model, a loopback loss that ensures the network can correctly reinterpret its own output, and a multi-view identity loss that compares the features of the predicted 3D face and the input photograph from multiple viewing angles. We train a regression network using these objectives, a set of unlabeled photographs, and the morphable model itself, and demonstrate state-of-the-art results. Comment: CVPR 2018 version with supplemental material (http://openaccess.thecvf.com/content_cvpr_2018/html/Genova_Unsupervised_Training_for_CVPR_2018_paper.html)

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Reconstruction and analysis of dynamic shapes

Author: Vlasic Daniel, 1979-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2010

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 122-141). Motion capture has revolutionized entertainment and influenced fields as diverse as the arts, sports, and medicine. This is despite the limitation that it tracks only a small set of surface points. On the other hand, 3D scanning techniques digitize complete surfaces of static objects, but are not applicable to moving shapes. I present methods that overcome both limitations, and can obtain the moving geometry of dynamic shapes (such as people and clothes in motion) and analyze it in order to advance computer animation. Further understanding of dynamic shapes will enable various industries to enhance virtual characters, advance robot locomotion, improve sports performance, and aid in medical rehabilitation, thus directly affecting our daily lives. My methods efficiently recover much of the expressiveness of dynamic shapes from the silhouettes alone. Furthermore, the reconstruction quality is greatly improved by including surface orientations (normals). In order to make reconstruction more practical, I strive to capture dynamic shapes in their natural environment, which I do by using hybrid inertial and acoustic sensors. After capture, the reconstructed dynamic shapes are analyzed in order to enhance their utility. My algorithms then allow animators to generate novel motions, such as transferring facial performances from one actor onto another using multi-linear models. The presented research provides some of the first and most accurate reconstructions of complex moving surfaces, and is among the few approaches that establish a relationship between different dynamic shapes. By Daniel Vlasic. Ph.D.

DSpace@MIT

Face Transfer with Multilinear Models

Author: Brand Matthew
Pfister Hanspeter
Popovic Jovan
Vlasic Daniel
Publication venue: Association for Computing Machinery (ACM)
Publication date: 16/06/2010

Face Transfer is a method for mapping videorecorded performances of one individual to facial animations of another. It extracts visemes (speech-related mouth articulations), expressions, and three-dimensional (3D) pose from monocular video or film footage. These parameters are then used to generate and drive a detailed 3D textured face mesh for a target identity, which can be seamlessly rendered back into target footage. The underlying face model automatically adjusts for how the target performs facial expressions and visemes. The performance data can be easily edited to change the visemes, expressions, pose, or even the identity of the target: the attributes are separably controllable. This supports a wide variety of video rewrite and puppetry applications. Face Transfer is based on a multilinear model of 3D face meshes that separably parameterizes the space of geometric variations due to different attributes (e.g., identity, expression, and viseme). Separability means that each of these attributes can be independently varied. A multilinear model can be estimated from a Cartesian product of examples (identities x expressions x visemes) with techniques from statistical analysis, but only after careful preprocessing of the geometric data set to secure one-to-one correspondence, to minimize cross-coupling artifacts, and to fill in any missing examples. Face Transfer offers new solutions to these problems and links the estimated model with a face-tracking algorithm to extract pose, expression, and viseme parameters. Subject: Engineering and Applied Science
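As an illustration of the separability described in this abstract, here is a minimal sketch of evaluating a toy multilinear model with two attributes (identity and expression). It is not the authors' implementation: the function name `synthesize`, the tensor `CORE`, and all numeric values are hypothetical, and the sketch omits estimating the model from data.

```python
# Hypothetical toy multilinear face model (illustrative only, not the paper's code).
# core[v][i][e] is the contribution of identity basis i and expression basis e
# to mesh coordinate v.

def synthesize(core, w_identity, w_expression):
    """Contract the core tensor with one weight vector per attribute."""
    return [
        sum(
            core[v][i][e] * w_identity[i] * w_expression[e]
            for i in range(len(w_identity))
            for e in range(len(w_expression))
        )
        for v in range(len(core))
    ]

# Two mesh coordinates, two identity bases, two expression bases (made-up values).
CORE = [
    [[1.0, 0.5], [2.0, 1.0]],
    [[0.0, 1.0], [1.0, 3.0]],
]

# Vary the expression weights while holding identity fixed, and vice versa.
neutral_a = synthesize(CORE, [1.0, 0.0], [1.0, 0.0])
smile_a = synthesize(CORE, [1.0, 0.0], [0.0, 1.0])
smile_b = synthesize(CORE, [0.0, 1.0], [0.0, 1.0])
```

Because each attribute has its own weight vector, swapping the identity weights while keeping the expression weights (as in `smile_a` versus `smile_b`) changes whose face is produced without changing the expression parameters.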

Harvard University - DASH

Opacity Light Fields: Interactive Rendering of Surface Light Fields with View-Dependent Opacity

Author: Grzeszczuk Radek
Matusik Wojciech
Molinov Sergey
Pfister Hanspeter
Vlasic Daniel
Publication venue: Association for Computing Machinery (ACM)
Publication date: 16/06/2010

We present new hardware-accelerated techniques for rendering surface light fields with opacity hulls that allow for interactive visualization of objects that have complex reflectance properties and elaborate geometrical details. The opacity hull is a shape enclosing the object with view-dependent opacity parameterized onto that shape. We call the combination of opacity hulls and surface light fields the opacity light field. Opacity light fields are ideally suited for rendering of the visually complex objects and scenes obtained with 3D photography. We show how to implement opacity light fields in the framework of three surface light field rendering methods: view-dependent texture mapping, unstructured lumigraph rendering, and light field mapping. The modified algorithms can be effectively supported on modern graphics hardware. Our results show that all three implementations are able to achieve interactive or real-time frame rates. Subject: Engineering and Applied Science

Harvard University - DASH

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Author: Alabdulmohsin Ibrahim
Beyer Lucas
Chen Xi
Goodman Sebastian
Keysers Daniel
Kolesnikov Alexander
Mustafa Basil
Padlewski Piotr
Pavetic Filip
Rong Keran
Salz Daniel
Soricut Radu
Vlasic Daniel
Voigtlaender Paul
Wang Xiao
Wu Jialin
Xiong Xi
Yu Tianli
Zhai Xiaohua
Publication date: 17/10/2023

This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. As part of arriving at this strong performance, we compare Vision Transformer (ViT) models pretrained using classification objectives to contrastively (SigLIP) pretrained ones. We find that, while slightly underperforming on standard image classification benchmarks, SigLIP-based PaLI shows superior performance across various multimodal benchmarks, especially on localization and visually-situated text understanding. We scale the SigLIP image encoder up to 2 billion parameters and achieve a new state of the art on multilingual cross-modal retrieval. We hope that PaLI-3, at only 5B parameters, rekindles research on fundamental pieces of complex VLMs, and could fuel a new generation of scaled-up models.

arXiv.org e-Print Archive

Video face replacement

Author: Daniel Vlasic
DeCarlo D.
Essa I.
Everingham M.
Hanspeter Pfister
Jones A.
Kalyan Sunkavalli
Kemelmacher-Shlizerman I.
Kevin Dale
Micah K. Johnson
Pighin F. H.
Robertson B.
Viola P. A.
Wojciech Matusik
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2011

We present a method for replacing facial performances in video. Our approach accounts for differences in identity, visual appearance, speech, and timing between source and target videos. Unlike prior work, it does not require substantial manual operation or complex acquisition hardware, only single-camera video. We use a 3D multilinear model to track the facial performance in both videos. Using the corresponding 3D geometry, we warp the source to the target face and retime the source to match the target performance. We then compute an optimal seam through the video volume that maintains temporal consistency in the final composite. We showcase the use of our method on a variety of examples and present the result of a user study that suggests our results are difficult to distinguish from real video footage. Funding: National Science Foundation (U.S.) (Grant PHY-0835713); National Science Foundation (U.S.) (Grant DMS-0739255)

DSpace@MIT

Crossref

NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations

Author: Araujo André
Engelhardt Andreas
Ferrari Vittorio
Jampani Varun
Karpur Arjun
Li Yuanzhen
Liu Ce
Makadia Ameesh
Maninis Kevis-Kokitsi
Martin-Brualla Ricardo
Patel Kaushal
Popov Stefan
Sargent Kyle
Truong Karen
Vlasic Daniel
Zhou Howard
Publication date: 13/10/2023

Recent advances in neural reconstruction enable high-quality 3D object reconstruction from casually captured image collections. Current techniques mostly analyze their progress on relatively simple image collections where Structure-from-Motion (SfM) techniques can provide ground-truth (GT) camera poses. We note that SfM techniques tend to fail on in-the-wild image collections such as image search results with varying backgrounds and illuminations. To enable systematic research progress on 3D reconstruction from casual image captures, we propose NAVI: a new dataset of category-agnostic image collections of objects with high-quality 3D scans along with per-image 2D-3D alignments providing near-perfect GT camera parameters. These 2D-3D alignments allow us to extract accurate derivative annotations such as dense pixel correspondences, depth and segmentation maps. We demonstrate the use of NAVI image collections on different problem settings and show that NAVI enables more thorough evaluations that were not possible with existing datasets. We believe NAVI is beneficial for systematic research progress on 3D reconstruction and correspondence estimation. Project page: https://navidataset.github.io Comment: NeurIPS 2023 camera ready.

arXiv.org e-Print Archive

State of the Art Report on Video-based Graphics and Video Visualizations

Author: Agarwal
Agarwal
Agarwala
Aggarwal
Ahonen
Andriluka
Arulampalam
Assa
Assa
Avidan
Bai
Ballan
Barnes
Barron
Bartoli
Bay
Bennett
Bhat
Bishop
Botchen
Bousseau
Boykov
Brandel
Bruhn
Brutzer
Buehler
Caspi
Chen
Cheng
Collomosse
Cornelis
Correa
Coughlan
Cremers
Dalal
Daniel
Davison
Dellaert
Deutscher
Divvala
Dollar
Durou
Faugeras
Felzenszwalb
Felzenszwalb
Felzenszwalb
Fleet
Furukawa
Gall
Galvin
Gibson
Goldman
Hannuna
Harris
Hartley
Hoiem
Horn
Hu
Huang
Höferlin
Kakumanu
Kang
Kang
Ke
Kimber
Klein
Koutsourakis
Kumar
Kutulakos
Kwatra
Laptev
Laptev
Laurentini
Le
Lee
Li
Lindeberg
Liu
Lobay
Lowe
Lucas
Matas
McIvor
Mei
Mikolajczyk
Mikolajczyk
Moons
Moreels
Nienhaus
Patel
Peker
Pellegrini
Petrovic
Piccardi
Pritch
Radke
Ramanan
Rav-Acha
Rav-Acha
Rav-Acha
Reisfeld
Romdhani
Rother
Rubinstein
Rubinstein
Rubinstein
Russell
Schoeffmann
Seitz
Setlur
Setlur
Sezgin
Shesh
Shi
Sion
Starck
Stein
Stoykova
Sull
Sun
Szeliski
Szeliski
Teodosio
Torresani
Torresani
Truong
Urtasun
Van
Viola
Vlasic
Vogiatzis
Wang
Wang
Wang
Wang
Wang
Wang
Weickert
Welch
Wilson
Winnemöller
Wolf
Xu
Yeung
Zhao
Zhu
Publication venue: Wiley
Publication date: 01/01/2012

Crossref

Cronfa at Swansea University

Temporally coherent completion of dynamic shapes

Author: Ahmed N.
Brox T.
Chuang M.
Daniel Vlasic
Davis J.
de Aguiar E.
Gelfand N.
Hao Li
Held M.
Jovan Popović
Kazhdan M.
Kojekine N.
Liepa P.
Linjie Luo
Mark Pauly
Mitra N. J.
Pieter Peers
Starck J.
Szymon Rusinkiewicz
Süssmuth J.
Wand M.
Weise T.
Zhang L.
Publication venue: Association for Computing Machinery (ACM)

Crossref